perm filename TAVARE.TEX[EXM,TEX] blob sn#601116 filedate 1981-07-17 generic text, type C, neo UTF8
\input basic
\magnify{1300}
\null \vfill
\def\footnotetypeface{\:d }



\ctrline{Detecting Particular Genotypes in Populations under
Nonrandom Mating}
 
\vskip 20pt
\ctrline{S. Karlin$↑1$ and S. Tavare$↑2$}
\vskip 60pt
\parskip 5pt
\def\yyskip{\penalty-100\vskip6pt plus6pt minus4pt}

\noindent $↑1$Department of Mathematics, Stanford University,
Stanford, California 94305
  
\noindent $↑2$Department of Statistics, Colorado State
University, Fort Collins, Colorado 80523
\par\vfill\eject

\yyskip
\noindent {\bf Summary}
  
We investigate the time to formation of particular genotypes in 
populations under {\it non-random} mating systems.  We employ
two main techniques.  The first is a study of branching processes
with `killing'; these are models which behave just like a standard
Galton-Watson branching process with the added possibility of being
terminated by the occurrence of a special event in the process.  In
our case, this special event corresponds to formation or detection
of a group of individuals carrying a specific genotype.  We then use
these results and some natural approximation methods to analyze and 
interpret the gene formation problem in a simple way.

\yyskip
\noindent {\bf Introduction}
 
In this paper, we continue the study of the formation time of 
particular genotypes in finite populations.  This question, first
addressed by Robertson (1978), arose primarily in the context of
artificial selection and breeding schemes.  The results also
pertain to evolutionary theory and medical genetic counseling, since
they lend insight into the process of gene formation as a result of
mutation or crossing over.
 
The basic framework is the following:  In a population comprising $N$
individuals, each classified as one of the three genotypes
$AA, Aa, aa$, how long does it take to produce or form the first
$aa$-genotype under different mating and selection regimes?  Under
the assumption of a classical Wright-Fisher reproduction scheme
(cf. 1979, ch.6), Robertson ascertained by numerical matrix methods and
simulation that if the population initially comprised one
heterozygote (the remaining individuals being $AA$), then the time to
formation (and detection, since the $aa$-genotype is assumed visible
on formation) of the first $aa$ individual is about $2.08N↑{1/3}$
generations.  By comparison, the time to fixation of the $a$-allele,
given that it is a new mutant destined for fixation, is about $4N$
generations (Kimura, 1970).  Robertson's original model and a variety
of extensions have been analyzed by the authors (1980, 1981a,b) using
the method of diffusion approximation to the underlying Markov chain
models.  The novelty in the analysis lies in the use of diffusion
processes with killing, this killing term deriving from the appearance
of the new visible genotypes.  For example, the time to detection of
$aa$-types corresponds to the killing time of the diffusion process.
Using this approach, we were able to confirm and extend Robertson's
results, and provide a natural way to analyze formation or detection
times.
 
Implicit in these analyses is the appearance of an approximation
process that is continuous both in its time scale and its state space.
However, under a variety of reproduction schemes typically involving
non-random mating or strong selection effects, the approximating
process is no longer continuous in this way, but is {\sl discrete}
both in time and state space.  It is these processes we want to 
discuss in this paper.
 
The layout of the paper is as follows.  In Section I, we describe
the simplest model of a branching process with `killing' corresponding
to the detection in the population of individuals that are `defective'
in some way.  Some elementary properties of such a process are derived;
for an introduction to the properties of standard branching processes,
the reader might consult Karlin and Taylor (1975, Chapt.8).  As an
example, the results are exhibited explicitly for the linear fractional
process.  As will be seen, processes of this sort arise in the later
sections as approximations to our genetic processes, and so we highlight
their behavior at the beginning.
 
Section II discusses the detection problem in the case of selfing
schemes, where we assess the effects of selection and incomplete
penetrance in heterozygotes on the time to formation of the $aa$-genotype.
We also discuss the behavior of a sex-linked system (cf. James, 1979) and
a case involving a model for Becker's muscular dystrophy (cf. Gladstein
and Lange, 1978).  In Section III we focus attention on models which
lead to multi-dimensional branching processes.  Among these, we analyze
the effects of imperfect visibility (the $aa$-genotype need not be
detected as soon as it is formed).  We also study sib-mating and
parent-offspring mating systems.
 
Although our primary interest is in the genetic applications, we include
a brief appendix on the nature of the approximation methods used in
Section II.

\yyskip 
{\noindent \bf I. Branching Processes with Killing}
 
In this section we briefly develop some results concerning
one-dimensional Galton-Watson branching processes with killing.  The results
and methods derived here are used to analyze the approximating processes
described in Section II.  Motivated by these applications, we use
the terms {\it detection} and {\it killing} interchangeably; no
confusion should arise.
 
We consider a population of individuals that reproduces in the following
way.  Each individual alive at a particular time produces, independently
of the others alive at that time, a random number of offspring
distributed as a random variable $Z$ satisfying
$$Pr\{ Z = k\} = p↓k, \quad k ≥ 0; \qquad p↓0 > 0, \quad p↓0 + p↓1 < 1.\eqno(1)$$
Each offspring born to a particular individual has
probability $1 - \alpha$ of being found to be defective in some
way.  We assume in this simplest case that detection of defectives is
independent over all individuals in a family, and over families.  To
avoid trivialities, we also assume that $0 < \alpha < 1$.
A family of size $k$ survives to reproduce if, and only if, all $k$
individuals are normal.  This has probability $p↓k\alpha↑k, k ≥ 0$.
It follows that if $f(s) = \sum ↓{k ≥ 0} p↓k s↑k$ is the p.g.f.
of $Z$, then the (defective) p.g.f. of the number of offspring born
to an individual, restricted to the event that none of them is defective, is
$$g(s) = f(\alpha s),\eqno(2)$$
and $1 - g(1) = 1 - f(\alpha)$ is the probability that a family contains at
least one defective individual.  The population now evolves as follows.  Let
$X↓n$ be the number of individuals alive at time $n$.  The population
continues to the next generation only if no defective individuals are born.
Otherwise, we say the process has ended by a killing (or detection) event.
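The evolution just described can also be simulated directly, which gives an
independent check on the formulas below.  The following Python sketch is our
own illustration, not part of the original analysis; it assumes an offspring
distribution with finite support, passed as a list {\tt pmf} with
{\tt pmf[k]} $= p↓k$, and the function name {\tt simulate} and the example law
($p↓0 = p↓1 = 0.25$, $p↓2 = 0.5$, $\alpha = 0.9$) are hypothetical.  A
generation with $t$ offspring in total escapes detection with probability
$\alpha↑t$.

```python
import random


def simulate(pmf, alpha, x0=1, max_gen=10_000, rng=None):
    """Simulate one path of a branching process with killing (sketch).

    pmf[k] is p_k, the probability of a family of size k (finite support
    assumed); each offspring is independently defective w.p. 1 - alpha.
    Returns ("extinction", T_0) or ("detection", T_D).
    """
    if rng is None:
        rng = random.Random()
    sizes = range(len(pmf))
    x = x0
    for n in range(1, max_gen + 1):
        if x == 0:
            return "extinction", n - 1
        # Each of the x individuals independently draws a family size.
        total = sum(rng.choices(sizes, weights=pmf, k=x))
        # No offspring in this generation is defective w.p. alpha ** total;
        # otherwise the process is killed (detected) at generation n.
        if rng.random() >= alpha ** total:
            return "detection", n
        x = total
    raise RuntimeError("path not absorbed within max_gen generations")


# Hypothetical example: p_0 = 0.25, p_1 = 0.25, p_2 = 0.5, alpha = 0.9.
rng = random.Random(1981)
runs = [simulate([0.25, 0.25, 0.5], 0.9, rng=rng) for _ in range(20_000)]
frac_extinct = sum(kind == "extinction" for kind, _ in runs) / len(runs)
```

For this example the equation $f(\alpha s) = s$ reads
$0.405s↑2 - 0.775s + 0.25 = 0$, whose root in $(0,1)$ is $q \approx 0.411$,
so the simulated fraction {\tt frac\_extinct} should lie close to that value.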
Under the simple detection scheme introduced above, it is in principle
straightforward to analyze the process.  We take $X↓0 = 1$, and define
the iterates of $g(\cdot )$ by $$g↓0(s) = s, \qquad g↓n(s) = 
g(g↓{n-1}(s)), \qquad n ≥ 1.\eqno(3)$$
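Numerically, the iterates (3) are obtained by repeated composition; the
values $g↓n(1)$ are the probabilities that no detection has occurred by
generation $n$.  A minimal Python sketch, assuming the hypothetical offspring
law $p↓0 = p↓1 = 0.25$, $p↓2 = 0.5$ and $\alpha = 0.9$; the helper names are
ours.

```python
def make_g(f, alpha):
    """Defective p.g.f. g(s) = f(alpha * s), as in equation (2)."""
    return lambda s: f(alpha * s)


def g_iterate(g, s, n):
    """n-th iterate of equation (3): g_0(s) = s, g_n(s) = g(g_{n-1}(s))."""
    for _ in range(n):
        s = g(s)
    return s


def f(s):
    """Hypothetical offspring p.g.f. with p_0 = 0.25, p_1 = 0.25, p_2 = 0.5."""
    return 0.25 + 0.25 * s + 0.5 * s ** 2


g = make_g(f, alpha=0.9)
no_detection = [g_iterate(g, 1.0, n) for n in range(8)]  # g_n(1), n = 0..7
```

The sequence decreases from $g↓0(1) = 1$ through $g↓1(1) = f(0.9) = 0.88$
toward the extinction probability $q$.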

Intuitively, it is clear that the process ends either in extinction or in
detection.  Let $q$ be the probability that extinction prevails.  Using
Figure 1, it is simple to show that if $0 < \alpha < 1$, then $q$ is the
unique root satisfying $0 < q < 1$ of the functional equation
$g(s) = f(\alpha s) = s$ (note that $g(0) = p↓0 > 0$ and $g(1) = f(\alpha) < 1$);
if $X↓0 = i > 1$, then the extinction probability
is $q↑i$. (figure 1 about here)
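Since $h(s) = f(\alpha s) - s$ satisfies $h(0) = p↓0 > 0$ and
$h(1) = f(\alpha) - 1 < 0$, the root $q$ can be located by bisection.  The
Python sketch below illustrates this under those assumptions; the function
name and the offspring law ($p↓0 = p↓1 = 0.25$, $p↓2 = 0.5$, $\alpha = 0.9$)
are hypothetical.

```python
def extinction_probability(f, alpha, tol=1e-13):
    """Root q in (0, 1) of f(alpha * s) = s, found by bisection (sketch).

    Assumes p_0 = f(0) > 0 and 0 < alpha < 1, so h(s) = f(alpha * s) - s
    has h(0) > 0 and h(1) = f(alpha) - 1 < 0; convexity of f makes the
    sign change in (0, 1) unique.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(alpha * mid) - mid > 0:
            lo = mid  # h(mid) > 0: q lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f(s):
    """Hypothetical offspring p.g.f. with p_0 = 0.25, p_1 = 0.25, p_2 = 0.5."""
    return 0.25 + 0.25 * s + 0.5 * s ** 2


q = extinction_probability(f, 0.9)
q_from_three = q ** 3  # extinction probability when X_0 = 3
```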
 
The probability that the detection time $T↓D$ is greater than $n$ is given
by $Pr\{ T↓D > n\} = Pr\{ 0 ≤ X↓n \} = g↓n(1)$.  Since $g↓n(s)$ is decreasing
in $n$ for $s \in (q,s↓1)$, where $s↓1$ is the larger root of $f(\alpha s) = s$
satisfying $s↓1 > 1$, we conclude that $g↓n(1) → q$ as $n → ∞$.  This 
establishes that indeed the process terminates either by detection or
extinction.  The probability that detection prevails is then $1 - q↑i$ if
$X↓0 = i$.  
 
Two relevant distributions in the study of detection times are those of the
(conditional) detection time $T↓D$ and of $T = \min (T↓0,T↓D)$, the time to
extinction or detection.  We have $$Pr[T↓D > n\relv T↓D < ∞] =
{g↓n(1) -q \over 1 - q}, \qquad n ≥ 0, \eqno(4)$$ and
$$Pr[T > n] = g↓n(1) - g↓n(0), \qquad n ≥ 0. \eqno(5)$$
Here (4) uses $Pr[T↓D = ∞] = q$, and (5) follows because
$g↓n(0) = Pr[T↓0 ≤ n]$ and extinction by time $n$ precludes detection.
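Equations (4) and (5) need only the two sequences $g↓n(1)$ and $g↓n(0)$,
which can be generated together.  The Python sketch below tabulates both
tails and sums $Pr[T > n]$ to approximate $E[T]$; the function name is
hypothetical, and the offspring law ($p↓0 = p↓1 = 0.25$, $p↓2 = 0.5$,
$\alpha = 0.9$, with $q$ the root in $(0,1)$ of
$0.405s↑2 - 0.775s + 0.25 = 0$) is purely illustrative.

```python
def detection_tails(f, alpha, q, n_max):
    """Tail probabilities (4) and (5) for n = 0, ..., n_max (sketch).

    cond_td[n] = Pr[T_D > n | T_D < inf] = (g_n(1) - q) / (1 - q)
    tail_t[n]  = Pr[T > n]               = g_n(1) - g_n(0)
    """
    at_one, at_zero = 1.0, 0.0  # g_0(1) and g_0(0)
    cond_td, tail_t = [], []
    for _ in range(n_max + 1):
        cond_td.append((at_one - q) / (1.0 - q))
        tail_t.append(at_one - at_zero)
        at_one, at_zero = f(alpha * at_one), f(alpha * at_zero)
    return cond_td, tail_t


def f(s):
    """Hypothetical offspring p.g.f. with p_0 = 0.25, p_1 = 0.25, p_2 = 0.5."""
    return 0.25 + 0.25 * s + 0.5 * s ** 2


alpha = 0.9
# Root in (0, 1) of g(s) = s, i.e. of 0.405 s^2 - 0.775 s + 0.25 = 0.
q = (0.775 - (0.775 ** 2 - 4 * 0.405 * 0.25) ** 0.5) / (2 * 0.405)
cond_td, tail_t = detection_tails(f, alpha, q, n_max=30)
mean_T = sum(tail_t)  # E[T] = sum_{n >= 0} Pr[T > n], truncated at n = 30
```

Both tails decay geometrically, so truncating the sum for $E[T]$ at a
moderate $n$ loses very little.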
 
We can establish the asymptotic behavior of (4) and (5) as follows.
Let $\gamma = g\prime (q) = \alpha f\prime (\alpha q) < 1$.  A modification
of the proof provided by Athreya and Ney (1972, pp. 38-41) establishes
the existence of $$Q(s) = \lim↓{n → ∞} \gamma↑{-n}(g↓n(s) - q),
\qquad 0 ≤ s < s↓1, \eqno(6)$$ where $Q(s)$ satisfies the functional
equation $Q(g(s)) = \gamma Q(s)$, subject to $Q(q) = 0$, $Q\prime (q) = 1$,
and $Q\prime (s) > 0$ for every $s \in [0,s↓1)$.  It follows immediately
from (4) and (5) that $Pr[T > n] = O(\gamma ↑n)$ and
$Pr[T↓D > n\relv T↓D < ∞] = O(\gamma ↑n)$ as $n → ∞$.  Hence, both $T$ and
$T↓D$ (conditioned on $T↓D < ∞$) have finite moments of all orders.  We can
use the result in (6) to establish other interesting asymptotic properties of
the transition probabilities.  We give one example involving the asymptotic
conditional distribution.
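Equation (6) suggests a direct numerical approximation to $Q$: iterate $g$ a
moderate number of times and rescale by $\gamma↑{-n}$.  In the Python sketch
below the function name is hypothetical, and $q$ and $f\prime$ are supplied by
the caller; $n$ should not be too large, since $g↓n(s) - q$ eventually drops
to floating-point round-off.  For the linear fractional case the result can
be compared with the closed form (14).

```python
def gamma_and_Q(f, f_prime, alpha, q, s, n=30):
    """Approximate gamma = g'(q) and Q(s) from equation (6) (sketch).

    Uses Q(s) ~ gamma**(-n) * (g_n(s) - q) for a fixed moderate n; meant for
    0 <= s < s_1.  A very large n loses precision once g_n(s) - q approaches
    floating-point round-off.
    """
    gamma = alpha * f_prime(alpha * q)  # gamma = g'(q) = alpha f'(alpha q)
    x = s
    for _ in range(n):
        x = f(alpha * x)  # after the loop, x = g_n(s)
    return gamma, (x - q) / gamma ** n
```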
 
Fix $X↓0 = 1$, and set $a↑{(n)}↓j = Pr[X↓n = j\relv T > n]$ and define
$\varphi ↑{(n)}(s) = \sum↓{j=1}↑∞ a↓j↑{(n)}s↑j, \quad \relv s\relv ≤ 1$.
Since $Pr[X↓n = j, T > n]$ is the coefficient of $s↑j$ in $g↓n(s)$ for
$j ≥ 1$, (5) gives $\varphi↑{(n)}(s) = {g↓n(s) - g↓n(0)\over
g↓n(1) - g↓n(0)}$.  Now use (6) to see that $$\varphi↑{(n)}(s) =
{\gamma↑{-n}(g↓n(s) - q) + \gamma↑{-n}(q - g↓n(0)) \over
\gamma↑{-n}(g↓n(1) - q) + \gamma↑{-n}(q - g↓n(0))} → {Q(s) - Q(0) \over
Q(1) - Q(0)} \quad \hbox{as } n → ∞. \eqno(7)$$  The right-hand side
of (7) is the probability generating function of the asymptotic
conditional distribution $a↓j = \lim↓{n→∞} a↓j↑{(n)}$.
 
Loosely speaking, we interpret $a↓j$ as follows.  If the process has
been running for a long time, and neither detection nor extinction
has occurred, then $X↓n$ is in state $j$ with probability approximately
$a↓j$.  The mean of this asymptotic conditional distribution is 
$\sum↓{j≥1} ja↓j = {Q↑{\prime} (1) \over Q(1) - Q(0)}$.
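The mean of $\varphi↑{(n)}$ can be tracked numerically without computing the
coefficients $a↓j↑{(n)}$, since
$\varphi↑{(n)\prime}(1) = g↓n\prime(1)/(g↓n(1) - g↓n(0))$ and $g↓n\prime(1)$
follows from the chain rule.  A Python sketch with a hypothetical function
name; as $n$ grows the value should approach $Q\prime(1)/(Q(1) - Q(0))$.

```python
def conditional_mean(f, f_prime, alpha, n):
    """Mean of phi^{(n)}, i.e. E[X_n | T > n] with X_0 = 1 (sketch).

    phi^{(n)}'(1) = g_n'(1) / (g_n(1) - g_n(0)), where the chain rule gives
    g_n'(1) = prod_{k=0}^{n-1} g'(g_k(1)) and g'(s) = alpha f'(alpha s).
    """
    at_one, at_zero, deriv = 1.0, 0.0, 1.0
    for _ in range(n):
        deriv *= alpha * f_prime(alpha * at_one)  # multiply by g'(g_k(1))
        at_one, at_zero = f(alpha * at_one), f(alpha * at_zero)
    return deriv / (at_one - at_zero)
```

In the linear fractional case the limit is the mean
$(1 - s↑{-1}↓1)↑{-1}$ of the geometric law (15).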
 
There are essentially only two cases where explicit forms for the
iterates of $g(\cdot )$ are available.  One is the trivial case
$f(s) = p + (1-p)s$, $0 < p < 1$; the other is the linear
fractional p.g.f. $$f(s) = {r + s(1-r-p) \over 1 - ps}, \quad
0 < r < 1, \quad 0 < p < 1, \quad p+r ≤ 1, \eqno(8)$$ which corresponds
to $$p↓k = \left\{\vcenter{\halign{\lft{$#$}\cr
r, \qquad k = 0,\cr
(1-r)(1-p)p↑{k-1}, \qquad k ≥ 1.\cr}} \right.
\eqno(9)$$  In this latter case, we have $$g(s) = {r+s\alpha (1-r-p)
\over 1-p\alpha s}. \eqno(10)$$  Let $0 < s↓0 < 1 < s↓1$ be the roots
of the equation $g(s) =s$, and define $$K = {1 - p\alpha s↓1
\over 1 - p\alpha s↓0}. \eqno(11)$$  It is readily checked that
$0 < K < 1$, and using a standard method for iterating a linear
fractional p.g.f. (cf. Karlin and Taylor (1975), p. 403), based on the identity
$${g↓n(s) - s↓0 \over g↓n(s) - s↓1} = K↑n {s - s↓0 \over s - s↓1},$$
one obtains
$$g↓n(s) = {s↓0s↓1(K↑n-1) + s(s↓0-K↑ns↓1) \over (K↑ns↓0-s↓1)
-s(K↑n-1)}, \quad 0 ≤ s < s↓1. \eqno(12)$$  It follows immediately
from (12) that $\hbox{\bf{P}} (T↓0 ≤ n) = g↓n(0) → s↓0$ as $n → \infty$,
so that $q = \hbox{\bf{P}} (T↓0 < \infty ) = s↓0$.  Further, from (4),
$$\hbox{\bf{P}} [T↓D > n|T↓D < \infty ] = {K↑n(s↓1-s↓0) \over
K↑n(1-s↓0) + (s↓1-1)}. \eqno(13)$$  In this case, $K = \gamma$, and
so the function $Q(\cdot )$ is given by (12) as
$$Q(s) = {(s↓1-s↓0)(s↓0-s) \over (s-s↓1)}; \quad 0 ≤ s < s↓1. \eqno(14)$$
The asymptotic conditional distribution $\{ a↓j \}$ specified at (7)
has generating function ${Q(s) - Q(0) \over Q(1) - Q(0)} =
{s(1-s↑{-1}↓1) \over 1 - s↑{-1}↓1 s} ; \quad 0 ≤ s < 1$, showing
that $\{ a↓j \}$ is the geometric distribution
$$a↓j = (1 - s↑{-1}↓1)s↑{-(j-1)}↓1, \quad j ≥ 1, \eqno(15)$$ with
mean $(1 - s↑{-1}↓1)↑{-1}$.
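The closed forms (12), (13) and (15) can be checked against direct iteration
of (10).  The Python sketch below uses the illustrative values $r = 0.2$,
$p = 0.5$, $\alpha = 0.9$ (our choice, not from the text); the helper names
are hypothetical.  The roots $s↓0 < s↓1$ come from writing $g(s) = s$ as the
quadratic $p\alpha s↑2 - (1 - \alpha(1-r-p))s + r = 0$.

```python
import math


def lf_roots(r, p, alpha):
    """Fixed points s0 < 1 < s1 of g in (10), and K from (11).

    g(s) = s is equivalent to p*alpha*s^2 - (1 - alpha*(1-r-p))*s + r = 0.
    """
    a = p * alpha
    b = -(1.0 - alpha * (1.0 - r - p))
    disc = math.sqrt(b * b - 4.0 * a * r)
    s0, s1 = (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)
    K = (1.0 - a * s1) / (1.0 - a * s0)
    return s0, s1, K


def g_n_closed(s, n, s0, s1, K):
    """Closed-form iterate g_n(s), equation (12)."""
    Kn = K ** n
    return (s0 * s1 * (Kn - 1) + s * (s0 - Kn * s1)) / ((Kn * s0 - s1) - s * (Kn - 1))


def cond_detection_tail(n, s0, s1, K):
    """Pr[T_D > n | T_D < inf], equation (13)."""
    Kn = K ** n
    return Kn * (s1 - s0) / (Kn * (1 - s0) + (s1 - 1))


def geometric_15(j, s1):
    """a_j of equation (15): (1 - 1/s1) * s1**(-(j - 1)), j >= 1."""
    return (1 - 1 / s1) * s1 ** (-(j - 1))


r, p, alpha = 0.2, 0.5, 0.9
s0, s1, K = lf_roots(r, p, alpha)


def g(s):
    """Defective p.g.f. of equation (10)."""
    return (r + s * alpha * (1 - r - p)) / (1 - p * alpha * s)
```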
 
Another pertinent conditioning focuses attention on those paths that
become extinct, rather than those that end in detection.  The 
process $\{ \s X↓n, \quad n ≥ 0 \}$ that arises by conditioning
on extinction is again a branching process, with offspring p.g.f.
given by $\s f (s) = {g(sq) \over q} = {p \alpha s↓1 + s(1-p \alpha (s↓1 + s↓0))
\over 1 - sp \alpha s↓0}, \quad 0 ≤ s < 1$.
The process is subcritical (as it must be!) with offspring mean
$\s f↑{\prime} (1) = K < 1$.
 
For comparison with the paths allowing both extinction and detection,
we evaluate the asymptotic conditional distribution of $\s X↓n$.
Let $\s T↓0$ be the time to hit $\{ 0 \}$.  Then it is straightforward
to show that if $\s X↓0 = 1$, $$\s a↓j = \lim↓{n→ \infty }
\hbox{\bf {P}} [\s X↓n = j| \s T↓0 > n] = \left( 1-{s↓0\over s↓1} \right)
\left( {s↓0\over s↓1} \right) ↑{j-1}, \quad j ≥ 1. \eqno(16)$$
This distribution has mean $\left( 1-{s↓0 \over s↓1} \right)↑{-1} =
s↓1/(s↓1-s↓0)$.  Since $s↓0 < 1$, this is smaller than the mean
$(1 - s↑{-1}↓1)↑{-1} = s↓1/(s↓1-1)$ found in (15): the mean number
of individuals, conditional on the process being `alive', is
lower in this last case than when detection is also allowed.
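To make the comparison between (15) and (16) concrete, the Python sketch
below evaluates both means for illustrative parameters ($r = 0.2$, $p = 0.5$,
$\alpha = 0.9$, our choice) and checks (16) by iterating the conditioned
p.g.f. $\s f$; the function names are hypothetical.  It uses $\s f↓n(1) = 1$,
so that the mean of $\s X↓n$ is $(\s f\prime(1))↑n = K↑n$.

```python
import math


def lf_roots(r, p, alpha):
    """Fixed points s0 < 1 < s1 of g(s) = f(alpha s) in the linear fractional case."""
    a = p * alpha
    b = -(1.0 - alpha * (1.0 - r - p))
    disc = math.sqrt(b * b - 4.0 * a * r)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)


def conditioned_mean(r, p, alpha, n=30):
    """E[X*_n | X*_n > 0] for the extinction-conditioned process (sketch).

    f*(s) = g(s q) / q with q = s0.  Since f*_k(1) = 1 for every k, the chain
    rule gives f*_n'(1) = f*'(1)**n = g'(s0)**n, and
    E[X*_n | X*_n > 0] = f*_n'(1) / (1 - f*_n(0)).
    """
    s0, _ = lf_roots(r, p, alpha)

    def g(s):
        return (r + s * alpha * (1 - r - p)) / (1 - p * alpha * s)

    def g_prime(s):
        return alpha * (1 - r) * (1 - p) / (1 - p * alpha * s) ** 2

    at_zero = 0.0
    for _ in range(n):
        at_zero = g(at_zero * s0) / s0  # f*_k(0) -> f*_{k+1}(0)
    return g_prime(s0) ** n / (1.0 - at_zero)


r, p, alpha = 0.2, 0.5, 0.9
s0, s1 = lf_roots(r, p, alpha)
mean_15 = s1 / (s1 - 1)   # (1 - 1/s1)^{-1}: detection and extinction allowed
mean_16 = s1 / (s1 - s0)  # (1 - s0/s1)^{-1}: conditioned on extinction
```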
 
\vfill \eject \end